Revisiting the Impact of Different Annotation Schemes on PCFG Parsing: A Grammatical Dependency Evaluation

نویسندگان

  • Adriane Boyd
  • Detmar Meurers
چکیده

Recent parsing research has started addressing the questions a) how parsers trained on different syntactic resources differ in their performance and b) how to conduct a meaningful evaluation of the parsing results across such a range of syntactic representations. Two German treebanks, Negra and TüBa-D/Z, constitute an interesting testing ground for such research given that the two treebanks make very different representational choices for this language, which also is of general interest given that German is situated between the extremes of fixed and free word order. We show that previous work comparing PCFG parsing with these two treebanks employed PARSEVAL and grammatical function comparisons which were skewed by differences between the two corpus annotation schemes. Focusing on the grammatical dependency triples as an essential dimension of comparison, we show that the two very distinct corpora result in comparable parsing performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluating Evaluation Measures

This paper presents a thorough examination of the validity of three evaluation measures on parser output. We assess parser performance of an unlexicalised probabilistic parser trained on two German treebanks with different annotation schemes and evaluate parsing results using the PARSEVAL metric, the Leaf-Ancestor metric and a dependency-based evaluation. We reject the claim that the TüBa-D/Z a...

متن کامل

Treebank-Based Grammar Acquisition for German

Manual development of deep linguistic resources is time-consuming and costly and therefore often described as a bottleneck for traditional rule-based NLP. In my PhD thesis I present a treebank-based method for the automatic acquisition of LFG resources for German. The method automatically creates deep and rich linguistic representations from labelled data (treebanks) and can be applied to large...

متن کامل

A Testsuite for Testing Parser Performance on Complex German Grammatical Constructions

Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The diff...

متن کامل

How to Compare Treebanks

Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper address...

متن کامل

Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited

This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Princ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008